The objective of this analysis is to practise some data analysis, using the Python programming language, on the World Bank's data on Nigeria.
Nigeria, an African country on the Gulf of Guinea, has many natural landmarks and wildlife reserves. Protected areas such as Cross River National Park and Yankari National Park have waterfalls, dense rainforest, savanna and rare primate habitats. One of the most recognizable sites is Zuma Rock, a 725m-tall monolith outside the capital of Abuja that's pictured on the national currency.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# Load dataset. Source: World Bank website (https://data.worldbank.org/country/NG)
df=pd.read_csv('API_NGA_DS2_en_csv_v2_5455596.csv')
df.head()
| Data Source | World Development Indicators | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | ... | Unnamed: 57 | Unnamed: 58 | Unnamed: 59 | Unnamed: 60 | Unnamed: 61 | Unnamed: 62 | Unnamed: 63 | Unnamed: 64 | Unnamed: 65 | Unnamed: 66 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Last Updated Date | 5/10/2023 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Country Name | Country Code | Indicator Name | Indicator Code | 1960.0 | 1961.0 | 1962.0 | 1963.0 | 1964.0 | 1965.0 | ... | 2013.0 | 2014.0 | 2015.0 | 2016.000000 | 2017.0 | 2018.0 | 2019.000000 | 2020.0 | 2021.0 | 2022.0 |
| 4 | Nigeria | NGA | Intentional homicides (per 100,000 people) | VC.IHR.PSRC.P5 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 33.604193 | NaN | NaN | 21.740789 | NaN | NaN | NaN |
5 rows × 67 columns
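As an aside, World Bank country exports place four metadata lines above the real header, so the header cleanup performed below can also be collapsed into the read itself with `skiprows`. A minimal sketch on a synthetic file that mimics the layout (the real call would use the downloaded filename):

```python
import io
import pandas as pd

# Synthetic stand-in for the first lines of a World Bank country CSV:
# two metadata lines, each followed by a blank line, then the real header
raw = (
    '"Data Source","World Development Indicators"\n'
    "\n"
    '"Last Updated Date","2023-05-10"\n'
    "\n"
    '"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961"\n'
    '"Nigeria","NGA","GDP growth (annual %)","NY.GDP.MKTP.KD.ZG","","0.19"\n'
)

# skiprows=4 jumps straight to the real header row
df = pd.read_csv(io.StringIO(raw), skiprows=4)
print(df.columns.tolist())
# For the real file: pd.read_csv('API_NGA_DS2_en_csv_v2_5455596.csv', skiprows=4)
```

This would make the row-dropping and header-promotion steps below unnecessary, but the manual route is shown here for learning purposes.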
# Let's make a copy of the original dataset
df_copy = df.copy()
# Drop the first 3 rows (file metadata, not data)
df_copy = df_copy.drop([0, 1, 2], axis=0)
# Check the result after dropping the first 3 rows
df_copy.head()
| Data Source | World Development Indicators | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | ... | Unnamed: 57 | Unnamed: 58 | Unnamed: 59 | Unnamed: 60 | Unnamed: 61 | Unnamed: 62 | Unnamed: 63 | Unnamed: 64 | Unnamed: 65 | Unnamed: 66 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Country Name | Country Code | Indicator Name | Indicator Code | 1960.0 | 1961.0 | 1962.0 | 1963.0 | 1964.0 | 1965.0 | ... | 2013.000000 | 2014.000000 | 2015.000000 | 2016.000000 | 2017.000000 | 2018.000000 | 2019.000000 | 2020.000000 | 2021.000000 | 2022.0 |
| 4 | Nigeria | NGA | Intentional homicides (per 100,000 people) | VC.IHR.PSRC.P5 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 33.604193 | NaN | NaN | 21.740789 | NaN | NaN | NaN |
| 5 | Nigeria | NGA | Internally displaced persons, new displacement... | VC.IDP.NWDS | NaN | NaN | NaN | NaN | NaN | NaN | ... | 117000.000000 | 3000.000000 | 100000.000000 | 78000.000000 | 122000.000000 | 613000.000000 | 157000.000000 | 279000.000000 | 24000.000000 | NaN |
| 6 | Nigeria | NGA | Voice and Accountability: Percentile Rank, Upp... | VA.PER.RNK.UPPER | NaN | NaN | NaN | NaN | NaN | NaN | ... | 30.516432 | 34.482758 | 40.886700 | 41.871922 | 41.379311 | 37.198067 | 37.681160 | 34.782608 | 34.299519 | NaN |
| 7 | Nigeria | NGA | Voice and Accountability: Estimate | VA.EST | NaN | NaN | NaN | NaN | NaN | NaN | ... | -0.693028 | -0.587156 | -0.372614 | -0.319363 | -0.339919 | -0.430503 | -0.434371 | -0.580638 | -0.636556 | NaN |
5 rows × 67 columns
We can observe from the table above that we don't have the desired column headers; the preferred header titles are in the row with index 3. We will need to promote row (index) 3 to the column header.
# Convert row 3 to column headers
df_copy.columns=df_copy.iloc[0]
# Checking result
df_copy.head(2)
| 3 | Country Name | Country Code | Indicator Name | Indicator Code | 1960.0 | 1961.0 | 1962.0 | 1963.0 | 1964.0 | 1965.0 | ... | 2013.0 | 2014.0 | 2015.0 | 2016.0 | 2017.0 | 2018.0 | 2019.0 | 2020.0 | 2021.0 | 2022.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Country Name | Country Code | Indicator Name | Indicator Code | 1960.0 | 1961.0 | 1962.0 | 1963.0 | 1964.0 | 1965.0 | ... | 2013.0 | 2014.0 | 2015.0 | 2016.000000 | 2017.0 | 2018.0 | 2019.000000 | 2020.0 | 2021.0 | 2022.0 |
| 4 | Nigeria | NGA | Intentional homicides (per 100,000 people) | VC.IHR.PSRC.P5 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 33.604193 | NaN | NaN | 21.740789 | NaN | NaN | NaN |
2 rows × 67 columns
# Drop the redundant Country Name, Country Code and Indicator Code columns
df_copy= df_copy.drop(['Country Name','Country Code','Indicator Code'], axis=1)
# Check the result
df_copy.head(2)
| 3 | Indicator Name | 1960.0 | 1961.0 | 1962.0 | 1963.0 | 1964.0 | 1965.0 | 1966.0 | 1967.0 | 1968.0 | ... | 2013.0 | 2014.0 | 2015.0 | 2016.0 | 2017.0 | 2018.0 | 2019.0 | 2020.0 | 2021.0 | 2022.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Indicator Name | 1960.0 | 1961.0 | 1962.0 | 1963.0 | 1964.0 | 1965.0 | 1966.0 | 1967.0 | 1968.0 | ... | 2013.0 | 2014.0 | 2015.0 | 2016.000000 | 2017.0 | 2018.0 | 2019.000000 | 2020.0 | 2021.0 | 2022.0 |
| 4 | Intentional homicides (per 100,000 people) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 33.604193 | NaN | NaN | 21.740789 | NaN | NaN | NaN |
2 rows × 64 columns
# Transpose the Indicator Name rows to columns
df_copy.set_index('Indicator Name').T.head(2).copy()
| Indicator Name | Indicator Name | Intentional homicides (per 100,000 people) | Internally displaced persons, new displacement associated with disasters (number of cases) | Voice and Accountability: Percentile Rank, Upper Bound of 90% Confidence Interval | Voice and Accountability: Estimate | High-technology exports (current US$) | Merchandise exports to low- and middle-income economies within region (% of total merchandise exports) | Merchandise exports to low- and middle-income economies in South Asia (% of total merchandise exports) | Merchandise exports to low- and middle-income economies in East Asia & Pacific (% of total merchandise exports) | Merchandise exports to economies in the Arab World (% of total merchandise exports) | ... | School enrollment, primary, female (% gross) | Primary education, pupils | Educational attainment, at least completed primary, population 25+ years, female (%) (cumulative) | Primary school starting age (years) | School enrollment, preprimary, male (% gross) | Preprimary education, duration (years) | School enrollment, primary (gross), gender parity index (GPI) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, youth female (% of females ages 15-24) | Regulatory Quality: Percentile Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | |||||||||||||||||||||
| 1960.0 | 1960.0 | NaN | NaN | NaN | NaN | NaN | 0.692941 | 0.303162 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961.0 | 1961.0 | NaN | NaN | NaN | NaN | NaN | 0.864375 | 0.349866 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 rows × 1479 columns
# Drop the row with index 3 from the dataset since it has been used as the column headers
df_copy2=df_copy.drop([3], axis =0).copy()
df_copy2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1478 entries, 4 to 1481
Data columns (total 64 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Indicator Name  1478 non-null   object
 1   1960.0          170 non-null    float64
 2   1961.0          195 non-null    float64
 3   1962.0          215 non-null    float64
 4   1963.0          232 non-null    float64
 5   1964.0          234 non-null    float64
 6   1965.0          239 non-null    float64
 7   1966.0          235 non-null    float64
 8   1967.0          238 non-null    float64
 9   1968.0          243 non-null    float64
 10  1969.0          244 non-null    float64
 11  1970.0          341 non-null    float64
 12  1971.0          357 non-null    float64
 13  1972.0          363 non-null    float64
 14  1973.0          356 non-null    float64
 15  1974.0          340 non-null    float64
 16  1975.0          345 non-null    float64
 17  1976.0          350 non-null    float64
 18  1977.0          422 non-null    float64
 19  1978.0          418 non-null    float64
 20  1979.0          415 non-null    float64
 21  1980.0          400 non-null    float64
 22  1981.0          527 non-null    float64
 23  1982.0          538 non-null    float64
 24  1983.0          550 non-null    float64
 25  1984.0          544 non-null    float64
 26  1985.0          559 non-null    float64
 27  1986.0          560 non-null    float64
 28  1987.0          541 non-null    float64
 29  1988.0          552 non-null    float64
 30  1989.0          569 non-null    float64
 31  1990.0          697 non-null    float64
 32  1991.0          685 non-null    float64
 33  1992.0          702 non-null    float64
 34  1993.0          678 non-null    float64
 35  1994.0          674 non-null    float64
 36  1995.0          707 non-null    float64
 37  1996.0          759 non-null    float64
 38  1997.0          702 non-null    float64
 39  1998.0          737 non-null    float64
 40  1999.0          766 non-null    float64
 41  2000.0          878 non-null    float64
 42  2001.0          796 non-null    float64
 43  2002.0          832 non-null    float64
 44  2003.0          919 non-null    float64
 45  2004.0          846 non-null    float64
 46  2005.0          919 non-null    float64
 47  2006.0          954 non-null    float64
 48  2007.0          940 non-null    float64
 49  2008.0          964 non-null    float64
 50  2009.0          917 non-null    float64
 51  2010.0          1028 non-null   float64
 52  2011.0          992 non-null    float64
 53  2012.0          928 non-null    float64
 54  2013.0          992 non-null    float64
 55  2014.0          989 non-null    float64
 56  2015.0          999 non-null    float64
 57  2016.0          981 non-null    float64
 58  2017.0          920 non-null    float64
 59  2018.0          1027 non-null   float64
 60  2019.0          919 non-null    float64
 61  2020.0          836 non-null    float64
 62  2021.0          656 non-null    float64
 63  2022.0          54 non-null     float64
dtypes: float64(63), object(1)
memory usage: 739.1+ KB
Here we will use the indicator names as column headers and the year values as the row index.
# Select only the column titled Indicator Name
df_indicator_name = df_copy2['Indicator Name']
# Select all date columns, excluding Indicator Name
df_date_series = df_copy2.iloc[:, 1:]
# Wrap the indicator names in a new dataframe for ease of transposition
df_indicator_name = pd.DataFrame(df_indicator_name)
df_combined=pd.concat([df_indicator_name, df_date_series], ignore_index=True)
df_combined.head(3)
| Indicator Name | 1960.0 | 1961.0 | 1962.0 | 1963.0 | 1964.0 | 1965.0 | 1966.0 | 1967.0 | 1968.0 | ... | 2013.0 | 2014.0 | 2015.0 | 2016.0 | 2017.0 | 2018.0 | 2019.0 | 2020.0 | 2021.0 | 2022.0 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Intentional homicides (per 100,000 people) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Internally displaced persons, new displacement... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Voice and Accountability: Percentile Rank, Upp... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 rows × 64 columns
# Generate a list of the values in the Indicator Name column
df_list = df_combined['Indicator Name'].drop_duplicates().to_list()
# Let's check the length of the list to ensure it fits the new dataframe we want to create
len(df_list)
1479
# Transpose the date columns and reset the index so the years become rows
date_indexed =pd.DataFrame(df_date_series.T).reset_index()
date_indexed.head(3)
| 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | ... | 1472 | 1473 | 1474 | 1475 | 1476 | 1477 | 1478 | 1479 | 1480 | 1481 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1960.0 | NaN | NaN | NaN | NaN | NaN | 0.692941 | 0.303162 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1961.0 | NaN | NaN | NaN | NaN | NaN | 0.864375 | 0.349866 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 1962.0 | NaN | NaN | NaN | NaN | NaN | 2.906853 | 0.084872 | NaN | 1.31551 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 rows × 1479 columns
# Copy the year column (currently labelled 3) into a new "year" column at the end
date_indexed['year'] = date_indexed[3]
# Drop the original year column so the dataframe keeps the same number of columns
df_cols = date_indexed.drop([3], axis=1)
df_cols.head(3)
| 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | ... | 1473 | 1474 | 1475 | 1476 | 1477 | 1478 | 1479 | 1480 | 1481 | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | 0.692941 | 0.303162 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1960.0 |
| 1 | NaN | NaN | NaN | NaN | NaN | 0.864375 | 0.349866 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1961.0 |
| 2 | NaN | NaN | NaN | NaN | NaN | 2.906853 | 0.084872 | NaN | 1.31551 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1962.0 |
3 rows × 1479 columns
# Rename the column headers with the indicator names and check that they fit the date columns they replace
df_cols.columns = df_list
# Replace the remaining NaN column header with "Year"
df_cols.columns =df_cols.columns.fillna("Year")
df_cols.head(2)
| Intentional homicides (per 100,000 people) | Internally displaced persons, new displacement associated with disasters (number of cases) | Voice and Accountability: Percentile Rank, Upper Bound of 90% Confidence Interval | Voice and Accountability: Estimate | High-technology exports (current US$) | Merchandise exports to low- and middle-income economies within region (% of total merchandise exports) | Merchandise exports to low- and middle-income economies in South Asia (% of total merchandise exports) | Merchandise exports to low- and middle-income economies in East Asia & Pacific (% of total merchandise exports) | Merchandise exports to economies in the Arab World (% of total merchandise exports) | ICT goods exports (% of total goods exports) | ... | Primary education, pupils | Educational attainment, at least completed primary, population 25+ years, female (%) (cumulative) | Primary school starting age (years) | School enrollment, preprimary, male (% gross) | Preprimary education, duration (years) | School enrollment, primary (gross), gender parity index (GPI) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, youth female (% of females ages 15-24) | Regulatory Quality: Percentile Rank | Year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | 0.692941 | 0.303162 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1960.0 |
| 1 | NaN | NaN | NaN | NaN | NaN | 0.864375 | 0.349866 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1961.0 |
2 rows × 1479 columns
# Change the Year column from float to integer
df_cols['Year']=df_cols['Year'].astype('int', copy=True)
df_cols.head(3)
| Intentional homicides (per 100,000 people) | Internally displaced persons, new displacement associated with disasters (number of cases) | Voice and Accountability: Percentile Rank, Upper Bound of 90% Confidence Interval | Voice and Accountability: Estimate | High-technology exports (current US$) | Merchandise exports to low- and middle-income economies within region (% of total merchandise exports) | Merchandise exports to low- and middle-income economies in South Asia (% of total merchandise exports) | Merchandise exports to low- and middle-income economies in East Asia & Pacific (% of total merchandise exports) | Merchandise exports to economies in the Arab World (% of total merchandise exports) | ICT goods exports (% of total goods exports) | ... | Primary education, pupils | Educational attainment, at least completed primary, population 25+ years, female (%) (cumulative) | Primary school starting age (years) | School enrollment, preprimary, male (% gross) | Preprimary education, duration (years) | School enrollment, primary (gross), gender parity index (GPI) | Literacy rate, adult female (% of females ages 15 and above) | Literacy rate, youth female (% of females ages 15-24) | Regulatory Quality: Percentile Rank | Year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | 0.692941 | 0.303162 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1960 |
| 1 | NaN | NaN | NaN | NaN | NaN | 0.864375 | 0.349866 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1961 |
| 2 | NaN | NaN | NaN | NaN | NaN | 2.906853 | 0.084872 | NaN | 1.31551 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1962 |
3 rows × 1479 columns
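The multi-step reshape above (splitting off the indicator column, transposing, and re-attaching headers) can also be expressed more compactly with `set_index(...).T`. A sketch on toy data shaped like the cleaned World Bank frame (indicator names and values here are illustrative):

```python
import pandas as pd

# Toy wide frame: one row per indicator, one column per year
wide = pd.DataFrame({
    "Indicator Name": ["GDP growth (annual %)", "Urban population"],
    "1960": [0.2, 7000000.0],
    "1961": [0.2, 7300000.0],
})

# Indicators become columns, years become the row index
long = wide.set_index("Indicator Name").T
long.index.name = "Year"
long.index = long.index.astype(int)
print(long)
```

Either route ends in the same shape; the manual version above is shown step by step for clarity.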
# Save the new dataframe
df_new = df_cols
df_new = df_new.round(1)
# Let's set the Year values as the row index
df_new.index = df_new['Year']
# Check percentage of missing values and output as a dataframe
check_missing_values =pd.DataFrame((df_new.isnull().sum()/df_new.shape[0])*100).round(2).sort_values(by= 0, ascending=False)
check_missing_values=check_missing_values.set_axis(['Missing Values%'], axis=1)
check_missing_values.columns.rename('Indicator Name', inplace=True)
check_missing_values.head()
| Indicator Name | Missing Values% |
|---|---|
| Educational attainment, Doctoral or equivalent, population 25+, female (%) (cumulative) | 100.0 |
| Survey mean consumption or income per capita, bottom 40% of population (2017 PPP $ per day) | 100.0 |
| Customs and other import duties (current LCU) | 100.0 |
| Taxes on exports (% of tax revenue) | 100.0 |
| Social contributions (% of revenue) | 100.0 |
It can be observed from the above that some economic indicators are missing values for more than 50% of the 63 years of data. Many of these indicators were only introduced recently, which may explain the gaps.

We may have to drop indicators and keep only those with adequate entries for further analysis. First, let's remove the indicators with the highest percentages of missing values.
# Use a tight threshold of 5% missing values to keep only well-populated indicators
df_dropped_cols = df_new.copy()
for i in df_dropped_cols.columns:
    if (df_dropped_cols[i].isnull().sum() / df_dropped_cols.shape[0]) * 100 > 5.0:
        del df_dropped_cols[i]
# Recheck the missing-value percentages on the reduced dataset
check_missing_values =pd.DataFrame((df_dropped_cols.isnull().sum()/df_dropped_cols.shape[0])*100).round(2).sort_values(by= 0, ascending=False)
check_missing_values=check_missing_values.set_axis(['Missing Values%'], axis=1)
check_missing_values.columns.rename('Indicator Name', inplace=True)
check_missing_values.head()
| Indicator Name | Missing Values% |
|---|---|
| Merchandise exports to low- and middle-income economies within region (% of total merchandise exports) | 4.76 |
| Renewable internal freshwater resources, total (billion cubic meters) | 4.76 |
| Arable land (hectares) | 4.76 |
| Merchandise exports by the reporting economy, residual (% of total merchandise exports) | 4.76 |
| Merchandise exports to high-income economies (% of total merchandise exports) | 4.76 |
Here we have now captured only indicators with less than 5% missing values. We have also trimmed the dataset down to indicators that can give better information. We can now proceed with the analysis.
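For reference, the column-dropping loop can be written in vectorised form with `isnull().mean()`, which avoids deleting columns one by one. A sketch on toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mostly_complete": [1.0, 2.0, 3.0, 4.0, np.nan],         # 20% missing
    "mostly_missing": [np.nan, np.nan, np.nan, np.nan, 5.0],  # 80% missing
})

# Keep only columns whose missing-value fraction is at most the threshold
threshold = 0.5
kept = df.loc[:, df.isnull().mean() <= threshold]
print(kept.columns.tolist())  # ['mostly_complete']
```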
# Fill the remaining missing values with zeros
df_filled = df_dropped_cols.fillna(0).copy()
# Check the result after dealing with missing values
df_filled.isnull().sum().head()
Merchandise exports to low- and middle-income economies within region (% of total merchandise exports)              0
Merchandise imports from low- and middle-income economies in Sub-Saharan Africa (% of total merchandise imports)    0
Merchandise imports (current US$)                                                                                   0
Urban population                                                                                                    0
Rural population                                                                                                    0
dtype: int64
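Filling with zeros is simple, but it can distort indicators that are never truly zero (population counts, for example). Linear interpolation is a common alternative for time series; a sketch on a toy series (not what the analysis above uses):

```python
import numpy as np
import pandas as pd

# Toy yearly indicator with two interior gaps
s = pd.Series([2.0, np.nan, np.nan, 5.0, 6.0],
              index=[2000, 2001, 2002, 2003, 2004], name="indicator")

# Linear interpolation fills interior gaps from the neighbouring years
filled = s.interpolate(method="linear")
print(filled.tolist())  # [2.0, 3.0, 4.0, 5.0, 6.0]
```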
plt.figure(figsize=(15,5))
ax=sns.barplot(data=df_filled, y= df_filled['Net migration'], x=df_filled['Year'])
ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha="right")
plt.tight_layout()
plt.show()
Nigeria's net migration has been negative. Net migration is the net total of migrants during the period, that is, the number of immigrants minus the number of emigrants, including both citizens and noncitizens. From the chart above we see a huge negative net-migration bar in 1984. This may be an outlier; more investigation is required to understand what was responsible for the huge negative value.
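One quick way to check whether a spike like the 1984 one is a statistical outlier is a z-score screen. A sketch on a hypothetical net-migration-like series (the values and the 2-standard-deviation cut-off are illustrative choices, not World Bank conventions):

```python
import pandas as pd

# Hypothetical series with one extreme year
s = pd.Series([-10.0, -12.0, -300.0, -11.0, -9.0, -13.0],
              index=[1982, 1983, 1984, 1985, 1986, 1987])

# Standardise: how many standard deviations is each year from the mean?
z = (s - s.mean()) / s.std()
outliers = s[z.abs() > 2]  # years more than 2 standard deviations out
print(outliers)
```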
According to the World Bank's long definition, GDP is the sum of gross value added by all resident producers in the economy, plus any product taxes, minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources.
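In code form, with made-up component figures purely for illustration:

```python
# GDP = sum of gross value added + product taxes - subsidies (per the definition above)
# All figures below are hypothetical, for illustration only
gross_value_added = [120.0, 80.0, 50.0]  # e.g. services, industry, agriculture
product_taxes = 15.0
subsidies = 5.0

gdp = sum(gross_value_added) + product_taxes - subsidies
print(gdp)  # 260.0
```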
df_new['GDP growth (annual %)'].mean().round(2)
3.68
df_new['GDP growth (annual %)'].max()
25.0
# Find the year when GDP growth hit its maximum of 25%
df_new['GDP growth (annual %)'].loc[df_new['GDP growth (annual %)']==25]
Year
1970    25.0
Name: GDP growth (annual %), dtype: float64
df_new['GDP growth (annual %)'].min()
-15.7
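Since the dataframe is indexed by Year, `idxmax`/`idxmin` can return the peak and trough years directly, without an equality filter. A sketch on a toy series (years and values here are illustrative):

```python
import pandas as pd

growth = pd.Series([1.5, 9.0, -4.2],
                   index=pd.Index([2001, 2002, 2003], name="Year"),
                   name="GDP growth (annual %)")

print(growth.idxmax())  # 2002, the year of the maximum
print(growth.idxmin())  # 2003, the year of the minimum
```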
# Scan backwards for the first zero value and return the non-zero value just before it
x = len(df_filled['GDP growth (annual %)'])
for i in reversed(range(x)):
    if df_filled['GDP growth (annual %)'].iloc[i] == 0.0:
        if df_filled['GDP growth (annual %)'].iloc[i - 1] != 0:
            print(df_filled[['GDP growth (annual %)', 'Year']].iloc[i - 1])
GDP growth (annual %)       3.6
Year                     2021.0
Name: 2021, dtype: float64
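The reversed loop can also be replaced by filtering out zeros and taking the last remaining index. A sketch on a toy series:

```python
import pandas as pd

s = pd.Series([3.4, 3.6, 0.0],
              index=pd.Index([2020, 2021, 2022], name="Year"))

# Last year with a non-zero value
nonzero = s[s != 0]
last_year = nonzero.index[-1]
print(last_year, nonzero.iloc[-1])  # 2021 3.6
```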
plt.figure(figsize=(15,5))
ax=sns.barplot(data=df_new, y= df_filled['Trade (% of GDP)'], x=df_filled['Year'])
ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha="right")
plt.tight_layout()
plt.show()
# Create a dataframe selecting 22 years of data (2000-2021)
df_10y = df_filled.query("Year >= 2000 and Year <2022")[['Inflation, consumer prices (annual %)', 'GDP growth (annual %)']]
# Check correlations
df_10y.corr()
| Inflation, consumer prices (annual %) | GDP growth (annual %) | |
|---|---|---|
| Inflation, consumer prices (annual %) | 1.000000 | -0.108584 |
| GDP growth (annual %) | -0.108584 | 1.000000 |
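As a sanity check, the Pearson coefficient that `.corr()` reports can be recomputed by hand from the covariance and standard deviations. A sketch on toy data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Pearson r = cov(x, y) / (std_x * std_y), with matching degrees of freedom
r = np.corrcoef(x, y)[0, 1]
manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(round(r, 4), round(manual, 4))
```

The two computations agree, which is a useful confidence check when interpreting a weak correlation like the -0.11 above.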
# Plot data
px.area(data_frame=df_10y)
Inflation has been trending upwards while Nigeria's GDP growth has remained slow, as seen from the charts. This could pose a serious concern for the economy.
This analysis was mainly intended to showcase data analytics without delving into the nuts and bolts of economic mechanics. It is worth noting, though, that some key economic indicators in the World Bank data suggest the economy may be lagging behind its true potential, particularly when reviewing the GDP growth rate, net migration, trade and inflation indicators. If these areas are worked on by the new administration, they could have a positive impact on the economy at large. The dataset also contains a host of other economic factors for the government to consider, which makes data analysis (with the aid of the Python programming language) of great importance.